Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation
Authors
Abstract
Today’s NLG efforts should be compared against actual human performance, which is fluent and varies randomly and with context. Consequently, evaluations should not be done against a fixed ‘gold standard’ text, and shared-task efforts should not assume that they can stipulate the representation of the source content and still let players generate the diversity of texts that the real world calls for.
1 Minimal competency
The proper point of reference when evaluating the output of a natural language generation (NLG) system is the output of a person. With the exception of the occasional speech error or other predictable disfluencies such as stuttering or restarts, people speak with complete command of their grammar (not to mention their culturally attuned prosodics), and with complete command of their discourse context as it shapes the coherence of what they say and the cohesion of how they say it. Any NLG system today that does not use pronouns correctly (assuming it uses them at all), that does not reduce complex NPs when they describe subsequent references to entities already introduced into the discourse, that does not reduce clauses with common subjects when they are conjoined, or that fails to use any of the other ordinary cohesive techniques available in the language it is using is simply not in the running. Human-level fluency is the entrance ticket to any comparative evaluation of NLG systems.
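As a purely illustrative sketch of one cohesion device named above, the snippet below (not from the paper; the function name, entity identifiers, and NP strings are all assumptions made for illustration) realizes an entity as a full NP on first mention and reduces it to a pronoun on subsequent mentions.

```python
# Hypothetical sketch: subsequent-mention reduction, one of the ordinary
# cohesive techniques the section says a minimally competent NLG system needs.

def realize_reference(entity_id, mentioned, full_np, pronoun):
    """Return a full NP on first mention, a pronoun on later mentions."""
    if entity_id in mentioned:
        return pronoun          # entity already introduced into the discourse
    mentioned.add(entity_id)    # record the introduction
    return full_np

mentioned = set()
print(realize_reference("e1", mentioned, "the new sales manager", "she"))  # -> "the new sales manager"
print(realize_reference("e1", mentioned, "the new sales manager", "she"))  # -> "she"
```

A real generator would of course condition this choice on salience, intervening competitors, and discourse structure rather than on mere prior mention; the sketch only makes the first-mention/subsequent-mention distinction concrete.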
Similar resources
Shared-Task Evaluations in HLT: Lessons for NLG
While natural language generation (NLG) has a strong evaluation tradition, in particular in user-based and task-oriented evaluation, it has never evaluated different approaches and techniques by comparing their performance on the same tasks (shared-task evaluation, STE). NLG is characterised by a lack of consolidation of results, and by isolation from the rest of NLP where STE is now standard. I...
Towards the Evaluation of Referring Expression Generation
The Natural Language Generation community is currently engaged in discussion as to whether and how to introduce one or several shared evaluation tasks, as are found in other fields of Natural Language Processing. As one of the most well-defined subtasks in NLG, the generation of referring expressions looks like a strong candidate for piloting such shared tasks. Based on our earlier evaluation of...
Bound Morpheme Frequencies in the Performance of Iranian English Language Undergraduates and English Language Materials Developers in Written Descriptive Tasks
This mini-corpus, cross-linguistic, comparative, and norm-referenced study intends to render the most frequently and oft-used affixes in the written descriptive tasks in the performance of English language materials developers (ELMDs) and Iranian English language undergraduates (IELUs). Samples of writings of both groups were studied and analyzed through affixation principles. The frequency of ...